GMM-Free Flat Start Sequence-Discriminative DNN Training
Authors
Abstract
Recently, attempts have been made to remove Gaussian mixture models (GMM) from the training process of deep neural network-based hidden Markov models (HMM/DNN). For the GMM-free training of an HMM/DNN hybrid we have to solve two problems, namely the initial alignment of the frame-level state labels and the creation of context-dependent states. Although flat-start training via iteratively realigning and retraining the DNN using a frame-level error function is viable, it is quite cumbersome. Here, we propose to use a sequence-discriminative training criterion for flat start. While sequence-discriminative training is routinely applied only in the final phase of model training, we show that with proper caution it is also suitable for getting an alignment of context-independent DNN models. For the construction of tied states we apply a recently proposed KL-divergence-based state clustering method, hence our whole training process is GMM-free. In the experimental evaluation we found that the sequence-discriminative flat start training method is not only significantly faster than the straightforward approach of iterative retraining and realignment, but the word error rates attained are slightly better as well.
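The KL-divergence-based state clustering mentioned above works by comparing the output distributions that context-dependent states induce over the context-independent DNN outputs. A minimal sketch of such a distance measure is shown below; the function names and the symmetrized-KL choice are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions, e.g. averaged
    DNN state posteriors; eps guards against log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def state_distance(p, q):
    """Symmetrized KL divergence, a common choice when a clustering
    algorithm needs a symmetric dissimilarity between states."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Two states with identical posterior distributions are at distance ~0,
# so a bottom-up clustering would tie them first.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])
print(state_distance(p, p))  # ~0.0
print(state_distance(p, q) > 0)
```

In a decision-tree setting, this distance (or the KL divergence of each candidate cluster against the pooled distribution) would drive the choice of questions when splitting states, playing the role the Gaussian likelihood gain plays in classical GMM-based tree building.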
Similar papers
Gaussian free cluster tree construction using deep neural network
This paper presents a Gaussian free approach to constructing the cluster tree (CT) that context dependent acoustic models (CDAM) depend on. Over the last few years deep neural networks (DNN) have supplanted Gaussian mixture models (GMM) as the default method for acoustic modeling (AM). DNN AMs have also been successfully used to flat start context independent (CI) AMs and generate alignments on...
Improving Computer Lipreading via DNN Sequence Discriminative Training Techniques
Although there have been some promising results in computer lipreading, there has been a paucity of data on which to train automatic systems. However the recent emergence of the TCDTIMIT corpus, with around 6000 words, 59 speakers and seven hours of recorded audio-visual speech, allows the deployment of more recent techniques in audio-speech such as Deep Neural Networks (DNNs) and sequence disc...
Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognition
We propose an algorithm that allows online training of a context dependent DNN model. It designs a state inventory based on DNN features and jointly optimizes the DNN parameters and alignment of the training data. The process allows flat starting a model from scratch and avoids any dependency on a GMM acoustic model to bootstrap the training process. A 15k state model trained with the proposed ...
Joint adaptation and adaptive training of TVWR for robust automatic speech recognition
Context-dependent Deep Neural Network has obtained consistent and significant improvements over the Gaussian Mixture Model (GMM) based systems for various speech recognition tasks. However, since DNN is discriminatively trained, it is more sensitive to label errors and is not reliable for unsupervised adaptation. Moreover, DNN parameters do not have a clear and meaningful interpretation, theref...
GMM-Free DNN Training
While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture models (GMMs) for alignments both for supervised training and for context dependent (CD) tree building. Here we explore bootstrapping DNN AM training without GMM AMs and show that CD trees can be built with DNN alignments which are better ma...
Publication year: 2016